{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Custom Retriever\n",
    "Many LLM applications involve retrieving information from external data sources using a Retriever.<br>\n",
    "许多LLM应用程序都涉及使用 Retriever .\n",
    "\n",
    "A retriever is responsible for retrieving a list of relevant Documents to a given user query.<br>\n",
    "检索器负责检索与给定用户 query 相关的 Documents 列表。\n",
    "\n",
    "The retrieved documents are often formatted into prompts that are fed into an LLM, allowing the LLM to use the information in the to generate an appropriate response (e.g., answering a user question based on a knowledge base).<br>\n",
    "检索到的文档通常被格式化为提示，这些提示被输入到一个LLM中，允许使用LLM中的信息来生成适当的响应（例如，根据知识库回答用户问题）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "from langchain_core.callbacks import CallbackManagerForRetrieverRun\n",
    "from langchain_core.documents import Document\n",
    "from langchain_core.retrievers import BaseRetriever\n",
    "\n",
    "\n",
    "class ToyRetriever(BaseRetriever):\n",
    "    \"\"\"A toy retriever that contains the top k documents that contain the user query.\n",
    "\n",
    "    This retriever only implements the sync method _get_relevant_documents.\n",
    "\n",
    "    If the retriever were to involve file access or network access, it could benefit\n",
    "    from a native async implementation of `_aget_relevant_documents`.\n",
    "\n",
    "    As usual, with Runnables, there's a default async implementation that's provided\n",
    "    that delegates to the sync implementation running on another thread.\n",
    "    \"\"\"\n",
    "\n",
    "    documents: List[Document]\n",
    "    \"\"\"List of documents to retrieve from.\"\"\"\n",
    "    k: int\n",
    "    \"\"\"Number of top results to return\"\"\"\n",
    "\n",
    "    def _get_relevant_documents(\n",
    "        self, query: str, *, run_manager: CallbackManagerForRetrieverRun\n",
    "    ) -> List[Document]:\n",
    "        \"\"\"Sync implementations for retriever.\"\"\"\n",
    "        matching_documents = []\n",
    "        for document in self.documents:\n",
    "            if len(matching_documents) > self.k:\n",
    "                return matching_documents\n",
    "\n",
    "            if query.lower() in document.page_content.lower():\n",
    "                matching_documents.append(document)\n",
    "        return matching_documents\n",
    "\n",
    "    # Optional: Provide a more efficient native implementation by overriding\n",
    "    # _aget_relevant_documents\n",
    "    # async def _aget_relevant_documents(\n",
    "    #     self, query: str, *, run_manager: AsyncCallbackManagerForRetrieverRun\n",
    "    # ) -> List[Document]:\n",
    "    #     \"\"\"Asynchronously get documents relevant to a query.\n",
    "\n",
    "    #     Args:\n",
    "    #         query: String to find relevant documents for\n",
    "    #         run_manager: The callbacks handler to use\n",
    "\n",
    "    #     Returns:\n",
    "    #         List of relevant documents\n",
    "    #     \"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = [\n",
    "    Document(\n",
    "        page_content=\"Dogs are great companions, known for their loyalty and friendliness.\",\n",
    "        metadata={\"type\": \"dog\", \"trait\": \"loyalty\"},\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Cats are independent pets that often enjoy their own space.\",\n",
    "        metadata={\"type\": \"cat\", \"trait\": \"independence\"},\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Goldfish are popular pets for beginners, requiring relatively simple care.\",\n",
    "        metadata={\"type\": \"fish\", \"trait\": \"low maintenance\"},\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Parrots are intelligent birds capable of mimicking human speech.\",\n",
    "        metadata={\"type\": \"bird\", \"trait\": \"intelligence\"},\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Rabbits are social animals that need plenty of space to hop around.\",\n",
    "        metadata={\"type\": \"rabbit\", \"trait\": \"social\"},\n",
    "    ),\n",
    "]\n",
    "retriever = ToyRetriever(documents=documents, k=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever.invoke(\"that\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain0_1",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
